45 research outputs found
LLMs and Finetuning: Benchmarking cross-domain performance for hate speech detection
This paper compares different pre-trained and fine-tuned large language
models (LLMs) for hate speech detection. Our research underscores challenges in
LLMs' cross-domain validity and overfitting risks. Through evaluations, we
highlight the need for fine-tuned models that grasp the nuances of hate speech
through greater label heterogeneity. We conclude with a vision for the future
of hate speech detection, emphasizing cross-domain generalizability and
appropriate benchmarking practices.
Comment: 9 pages, 3 figures, 4 tables
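The cross-domain validity issue the abstract raises can be illustrated with a minimal evaluation protocol: train on one annotated corpus and test on another. The sketch below is a stand-in, assuming a simple TF-IDF plus logistic-regression baseline rather than the paper's fine-tuned LLMs, and the tiny in-line examples and domain names are invented for illustration.

```python
# Sketch of a cross-domain evaluation protocol: fit a classifier on one
# hate-speech dataset ("domain A") and score it on another ("domain B").
# The examples are hypothetical; the paper benchmarks fine-tuned LLMs,
# not this TF-IDF baseline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

def cross_domain_f1(train_texts, train_labels, test_texts, test_labels):
    """Train on one domain, report macro-F1 on another."""
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(train_texts, train_labels)
    return f1_score(test_labels, clf.predict(test_texts), average="macro")

# Hypothetical labelled examples (1 = hateful, 0 = not hateful).
domain_a = (["I hate group X", "you people are vermin",
             "lovely day today", "great match last night"], [1, 1, 0, 0])
domain_b = (["those people disgust me", "nice weather outside"], [1, 0])

in_domain = cross_domain_f1(*domain_a, *domain_a)  # optimistic: same domain
cross = cross_domain_f1(*domain_a, *domain_b)      # realistic: unseen domain
print(f"in-domain F1 {in_domain:.2f}  cross-domain F1 {cross:.2f}")
```

The gap between the in-domain and cross-domain scores is the overfitting risk the abstract describes: a model can look strong on its own corpus while generalizing poorly to hate speech annotated under different guidelines.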
It Takes Two to Negotiate: Modeling Social Exchange in Online Multiplayer Games
Online games are dynamic environments where players interact with each other,
which offers a rich setting for understanding how players negotiate their way
through the game to an ultimate victory. This work studies online player
interactions during the turn-based strategy game, Diplomacy. We annotated a
dataset of over 10,000 chat messages for different negotiation strategies and
empirically examined their importance in predicting long- and short-term game
outcomes. Although negotiation strategies can be predicted reasonably
accurately through linguistic modeling of the chat messages, such modeling
alone is insufficient for predicting short-term outcomes such as
trustworthiness. On the other hand,
they are essential in graph-aware reinforcement learning approaches to predict
long-term outcomes, such as a player's success, based on their prior
negotiation history. We close with a discussion of the implications and impact
of our work. The dataset is available at
https://github.com/kj2013/claff-diplomacy.
Comment: 28 pages, 11 figures. Accepted to CSCW '24 and forthcoming in the
Proceedings of ACM HCI '2
Social Media and Electoral Predictions: A Meta-Analytic Review
Can social media data be used to make reasonably accurate estimates of electoral outcomes? We conducted a meta-analytic review to examine the predictive performance of different methods and of two families of social media features in predicting political elections: (1) content features and (2) structural features. Across 45 published studies, we find significant variance in the quality of predictions, which on average still lag behind those of traditional survey research. More specifically, our findings show that machine learning-based approaches generally outperform lexicon-based analyses, and that combining structural and content features yields the most accurate predictions.
Understanding and Measuring Psychological Stress using Social Media
A body of literature has demonstrated that users' mental health conditions,
such as depression and anxiety, can be predicted from their social media
language. However, there is still a gap in the scientific understanding of how
psychological stress is expressed on social media. Stress is one of the primary
underlying causes and correlates of chronic physical illnesses and mental
health conditions. In this paper, we explore the language of psychological
stress with a dataset of 601 social media users, who answered the Perceived
Stress Scale questionnaire and also consented to share their Facebook and
Twitter data. Firstly, we find that stressed users post about exhaustion,
losing control, increased self-focus, and physical pain, whereas users who
are not stressed post about breakfast, family time, and travel.
Secondly, we find that Facebook language is more predictive of stress than
Twitter language. Thirdly, we demonstrate how the language-based models thus
developed can be adapted and scaled to measure county-level trends. Since
county-level language is easily available on Twitter using the Streaming API,
we explore multiple domain adaptation algorithms to adapt user-level Facebook
models to Twitter language. We find that domain-adapted and scaled social
media-based measurements of stress outperform sociodemographic variables (age,
gender, race, education, and income), against ground-truth survey-based stress
measurements, both at the user- and the county-level in the U.S. Twitter
language that scores higher in stress is also predictive of poorer health, less
access to facilities and lower socioeconomic status in counties. We conclude
with a discussion of the implications of using social media as a new tool for
monitoring stress levels of both individuals and counties.
Comment: Accepted for publication in the proceedings of ICWSM 201
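The county-level scaling step described above can be sketched in a few lines: average the user-level stress scores within each county, then correlate the resulting estimates with survey-based ground truth. All scores, county FIPS codes, and survey values below are hypothetical; the paper's user-level scores come from domain-adapted language models, not hand-made numbers.

```python
# Sketch of county-level aggregation and validation against survey data.
from collections import defaultdict
from statistics import mean, stdev

def county_scores(user_scores):
    """user_scores: list of (county_fips, stress_score) pairs."""
    by_county = defaultdict(list)
    for fips, score in user_scores:
        by_county[fips].append(score)
    return {fips: mean(scores) for fips, scores in by_county.items()}

def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / ((len(xs) - 1) * stdev(xs) * stdev(ys))

# Hypothetical user-level model outputs keyed by county FIPS code.
users = [("42101", 0.8), ("42101", 0.6), ("06037", 0.3),
         ("06037", 0.5), ("36061", 0.7), ("36061", 0.9)]
estimates = county_scores(users)       # e.g. {"42101": 0.7, ...}
survey = {"42101": 0.65, "06037": 0.35, "36061": 0.85}
fips = sorted(estimates)
r = pearson([estimates[f] for f in fips], [survey[f] for f in fips])
print(f"county-level Pearson r = {r:.2f}")
```

In the paper this correlation is the yardstick by which the domain-adapted language measurements are shown to outperform sociodemographic variables.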
Literature review writing: a study of information selection from cited papers / Kokil Jaidka, Christopher Khoo and Jin-Cheon Na
This paper reports the results of a small study of how researchers select and edit research information from cited papers for inclusion in a literature review. It is part of a larger content analysis and linguistic analysis of literature reviews. The study aims to answer the following questions: where do authors select information from in the cited papers (e.g., the Abstract, Introduction, or Conclusion section)? What types of information do they select (e.g., research objectives, results)? And how do they transform that information (e.g., paraphrasing, cut-and-pasting)? To answer these questions, we analyzed the literature review sections of 20 articles from the Journal of the American Society for Information Science & Technology, 2001-2008. Referencing sentences were mapped to their source papers to determine their origin. Other features of the source information were also annotated, such as the type of information selected and the types of editing changes made to it before inclusion in the literature review. Preliminary results indicate that authors prefer to select information from the Abstract, Introduction, and Conclusion sections of the cited papers. This information is transformed through cut-and-paste, paraphrase, or higher-level semantic transformations to describe the research objective, methodology, and results of the referenced study. The choices made in selecting and transforming the source information appeared to be related to the two styles of literature review ultimately constructed: integrative and descriptive literature reviews.
Keywords: Literature reviews; Multi-document summarization; Information science; Information extraction; Information selection
Predicting Sentence-Level Factuality of News and Bias of Media Outlets
Predicting the factuality of news reporting and the bias of media outlets is
highly relevant for automated news credibility assessment and fact-checking.
While prior work has focused on the veracity of individual news stories, we
propose a fine-grained reliability analysis of entire media outlets.
Specifically, we study the prediction of sentence-level factuality of news
reporting and bias of media outlets, which may more accurately explain the
overall reliability of the entire source. We first manually produced a large
sentence-level dataset, titled "FactNews", composed of 6,191 sentences
expertly annotated according to the factuality and media bias definitions
from AllSides. We then present baseline models for sentence-level factuality
prediction, obtained by fine-tuning BERT. Finally, given the severity of fake
news and political polarization in Brazil, both the dataset and the baselines
were developed for Portuguese; however, our approach may be applied to any
other language.
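The step from sentence-level predictions to the "overall reliability of the entire source" can be sketched as a simple roll-up: the share of an outlet's sentences that a classifier labels factual. The outlet names and predictions below are hypothetical, and this aggregation is an illustration of the idea rather than the paper's documented procedure; the underlying sentence labels would come from the fine-tuned BERT baseline.

```python
# Sketch: roll sentence-level factuality predictions up into an
# outlet-level reliability score (share of sentences labelled factual).
from collections import defaultdict

def outlet_reliability(predictions):
    """predictions: list of (outlet, is_factual) pairs, one per sentence."""
    counts = defaultdict(lambda: [0, 0])   # outlet -> [factual, total]
    for outlet, is_factual in predictions:
        counts[outlet][0] += int(is_factual)
        counts[outlet][1] += 1
    return {o: factual / total for o, (factual, total) in counts.items()}

# Hypothetical per-sentence classifier outputs for two invented outlets.
preds = [("outlet_a", True), ("outlet_a", True), ("outlet_a", False),
         ("outlet_b", False), ("outlet_b", False), ("outlet_b", True)]
print(outlet_reliability(preds))   # outlet_a: 2/3, outlet_b: 1/3
```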
Just Another Day on Twitter: A Complete 24 Hours of Twitter Data
At the end of October 2022, Elon Musk concluded his acquisition of Twitter.
In the weeks and months before that, several questions were publicly discussed
that were not only of interest to the platform's future buyers, but also of
high relevance to the Computational Social Science research community. For
example, how many active users does the platform have? What percentage of
accounts on the site are bots? And what are the dominant topics and
sub-topical spheres on the platform? In a globally coordinated effort of 80
scholars to shed light on these questions, and to offer a dataset that will
equip other researchers to do the same, we have collected all 375 million
tweets published within a 24-hour time period starting on September 21, 2022.
To the best of our knowledge, this is the first complete 24-hour Twitter
dataset that is available for the research community. With it, the present work
aims to accomplish two goals. First, we seek to answer the aforementioned
questions and provide descriptive metrics about Twitter that can serve as
references for other researchers. Second, we create a baseline dataset for
future research that can be used to study the potential impact of the
platform's ownership change.
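The descriptive metrics the abstract mentions, such as active-user counts and tweet volume over the 24-hour window, can be sketched over a newline-delimited JSON dump. The record layout (`author_id`, `created_at`) is an assumption for illustration, not the dataset's documented schema, and the sample records are invented.

```python
# Sketch: count unique accounts and tweets per hour from a
# newline-delimited JSON tweet dump (hypothetical field names).
import json
from collections import Counter
from datetime import datetime

def describe(lines):
    users, per_hour = set(), Counter()
    for line in lines:
        tweet = json.loads(line)
        users.add(tweet["author_id"])
        hour = datetime.fromisoformat(tweet["created_at"]).hour
        per_hour[hour] += 1
    return len(users), per_hour

# Invented sample records within the dataset's 24-hour window.
sample = [
    '{"author_id": "u1", "created_at": "2022-09-21T00:15:00+00:00"}',
    '{"author_id": "u2", "created_at": "2022-09-21T00:45:00+00:00"}',
    '{"author_id": "u1", "created_at": "2022-09-21T13:05:00+00:00"}',
]
n_users, per_hour = describe(sample)
print(n_users, dict(per_hour))   # 2 {0: 2, 13: 1}
```

At the scale of 375 million tweets the same logic would stream the file rather than hold it in memory, but the metrics themselves are computed the same way.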